A Comparative Evaluation of Collocation Extraction Techniques
Abstract
This paper describes an experiment that attempts to compare a range of existing collocation extraction techniques, together with an implementation of a new technique based on tests for lexical substitutability. After a description of the experimental details, the techniques are discussed, with particular emphasis on any adaptations required to evaluate them in the way proposed. This is followed by a discussion of the relative strengths and weaknesses of the techniques with reference to the results obtained. Since there is no general agreement on the exact nature of collocation, evaluating techniques with reference to any single standard is somewhat controversial. Departing from this point, part of the concluding discussion includes initial proposals for a common framework for the evaluation of collocation extraction techniques.
Similar resources
Accurate Collocation Extraction Using a Multilingual Parser
This paper focuses on the use of advanced techniques of text analysis as support for collocation extraction. A hybrid system is presented that combines statistical methods and multilingual parsing for detecting accurate collocational information from English, French, Spanish and Italian corpora. The advantage of relying on full parsing over using a traditional window method (which ignores the s...
Induction of Syntactic Collocation Patterns from Generic Syntactic Relations
Syntactic configurations used in collocation extraction are highly divergent from one system to another, this questioning the validity of results and making comparative evaluation difficult. We describe a corpus-driven approach for inferring an exhaustive set of configurations from actual data by finding, with a parser, all the productive syntactic associations, then by appealing to human exper...
Comparative Evaluation of Collocation Extraction Metrics
Corpus-based automatic extraction of collocations is typically carried out employing some statistic indicating concurrency in order to identify words that co-occur more often than expected by chance. In this paper we are concerned with some typical measures such as the t-score, Pearson’s χ-square test, log-likelihood ratio, pointwise mutual information and a novel information theoretic measure,...
An Extensive Empirical Study of Collocation Extraction Methods
This paper presents a status quo of an ongoing research study of collocations – an essential linguistic phenomenon having a wide spectrum of applications in the field of natural language processing. The core of the work is an empirical evaluation of a comprehensive list of automatic collocation extraction methods using precision-recall measures and a proposal of a new approach integrating multi...
Identification of Multiwords as Preprocessing for Automatic Extraction of Lexical Similarities
Previous approaches on automatic extraction of lexical similarities have considered as semantic unit of text the word. However, the theoretical perspective of contextual lexical semantics suggests that larger segments of text, specifically non-compositional multiwords, are more appropriate for this role. We experimentally tested the applicability of this notion, applying automatic collocation e...